NMGMDA: a computational model for predicting potential microbe–drug associations based on minimize matrix nuclear norm and graph attention network

The prediction of potential microbe–drug associations is of great value for drug research and development, and deep learning-based methods in particular have achieved significant improvements in biomedicine. In this manuscript, we propose a novel computational model named NMGMDA, based on nuclear norm minimization (NNM) and the graph attention network (GAT), to infer latent microbe–drug associations. First, we construct a heterogeneous microbe–drug network by fusing the drug and microbe similarities with the established drug–microbe associations. Next, GAT and NNM are used to calculate the prediction scores. Lastly, we build a fivefold cross-validation framework to assess the performance of NMGMDA. According to the experimental results, NMGMDA outperforms some of the most advanced methods, with a reliable AUC of 0.9946 on both the MDAD and aBiofilm databases. Furthermore, case studies on Ciprofloxacin, Moxifloxacin, HIV-1 and Mycobacterium tuberculosis were carried out to further assess the effectiveness of NMGMDA. The results showed that, after the known associations were removed from the database, 16 and 14 drugs as well as 19 and 17 microbes among the top 20 predictions were validated by the literature. This demonstrates the potential of NMGMDA to achieve satisfactory prediction performance.

multi-core fusion model MKGNN based on Graph Convolutional Network (GCN). Zhu et al. 18 created a deep neural network-based prediction model for microbe-drug associations called NNAN. Tian et al. 19 created a contrastive learning model called SCSMDA to predict associations between microbes and drugs. To predict potential microbe-drug associations, Tan et al. 20 developed a computational method termed GSAMDA based on the graph attention network and the sparse autoencoder. Yang et al. 21 proposed a model called MKGCN for inferring microbe-drug associations based on multiple kernel fusion on a graph convolutional network. Ma et al. 22 designed a microbe-drug prediction model based on the graph attention network (GAT) and convolutional neural networks (CNN).
As described above, neural network-based methods are frequently used in latent association prediction tasks. Among them, CNN-based approaches adopt parameter sharing to effectively prevent overfitting; however, the pooling layer loses a significant amount of important information during processing. GCN-based approaches are better suited to non-matrix-structured data, but their scalability and flexibility are still quite limited. GAT-based methods can significantly improve the clustering performance of graph neural networks, but the clustering of higher-order neighborhoods remains a challenging task. Hence, better prediction results may be obtained by combining these prediction methods organically.
In this study, we introduce a novel computational approach called NMGMDA to predict latent associations between microbes and drugs, based on nuclear norm minimization 23 and the graph attention network 24. Figure 1 depicts the NMGMDA structure. In brief, our primary contributions are as follows:
• A novel heterogeneous network made up of microbes and drugs is created by combining the microbe similarity network, the drug similarity network, and the existing microbe-drug associations.
• To obtain predicted scores for potential microbe-drug associations, we use both the nuclear norm minimization (NNM) approach and the GAT-based auto-encoder, and then take the weighted average of the two predicted scores as the final result.
• Experimental results and case studies demonstrate the strong prediction performance of NMGMDA on both the MDAD and aBiofilm databases.

Data sources
In this study, we assessed NMGMDA on the following two databases to show its efficacy. MDAD is a database of microbe-drug associations that was assembled by Sun et al. 12 in 2018 from a variety of drug-related databases, including TTD and DrugBank, as well as a substantial body of literature. After superfluous data were eliminated, 2470 microbe-drug associations between 1373 drugs and 173 microbes remained.
www.nature.com/scientificreports/
The aBiofilm database was created by Rajput et al. 13 and includes 5027 anti-biofilm agents that target 140 microbes and were identified between 1988 and 2017. After redundant data were removed, 2884 microbe-drug associations between 1720 drugs and 140 microbes remained.
Table 1 provides specific statistics of the microbe-drug associations in MDAD and aBiofilm.

Microbe-drug adjacency matrix
Based on these microbe-drug associations, we first create an adjacency matrix $A \in \mathbb{R}^{n_d \times n_m}$, where $n_d$ and $n_m$ represent the numbers of drugs and microbes, respectively. $A_{ij}$ equals 1 if there is a known association between drug $d_i$ and microbe $m_j$; otherwise it equals 0.
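The construction of $A$ can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `build_adjacency` and the toy index pairs are hypothetical, and drugs and microbes are assumed to be already mapped to integer indices.

```python
import numpy as np

def build_adjacency(pairs, n_drugs, n_microbes):
    """Return A with A[i, j] = 1 if drug i has a known association with microbe j."""
    A = np.zeros((n_drugs, n_microbes), dtype=float)
    for drug_idx, microbe_idx in pairs:
        A[drug_idx, microbe_idx] = 1.0
    return A

# Toy example (hypothetical indices): 3 drugs, 2 microbes, 3 known associations.
known_pairs = [(0, 0), (1, 1), (2, 0)]
A = build_adjacency(known_pairs, n_drugs=3, n_microbes=2)
```

For MDAD the same call would use `n_drugs=1373` and `n_microbes=173`, with 2470 index pairs.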

Drug/microbe Gaussian kernel similarity
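Assuming that $d_i$ and $d_j$ are two drugs, the Gaussian kernel similarity $D_{GIP}(d_i, d_j)$, with $D_{GIP} \in \mathbb{R}^{n_d \times n_d}$, is determined by the following formula:
$$D_{GIP}(d_i, d_j) = \exp\left(-\gamma_d \left\| A(d_i) - A(d_j) \right\|^2\right)$$
where $\|A(d_i) - A(d_j)\|$ is the Euclidean distance between the two drugs. $\gamma_d$ is a normalization parameter; the greater $\gamma_d$ is, the easier it is to group similar feature points together. It is defined as follows:
$$\gamma_d = \frac{1}{\frac{1}{n_d}\sum_{i=1}^{n_d} \left\| A(d_i) \right\|^2}$$
Similarly, we calculate the Gaussian kernel similarity $M_{GIP}(m_i, m_j)$, with $M_{GIP} \in \mathbb{R}^{n_m \times n_m}$, between two microbes:
$$M_{GIP}(m_i, m_j) = \exp\left(-\gamma_m \left\| A(m_i) - A(m_j) \right\|^2\right), \qquad \gamma_m = \frac{1}{\frac{1}{n_m}\sum_{i=1}^{n_m} \left\| A(m_i) \right\|^2}$$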
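As an illustration, the Gaussian kernel similarity can be computed from the adjacency matrix as sketched below. The sketch assumes the commonly used Gaussian interaction profile form, in which the similarity is exp(-γ · squared Euclidean distance) and γ is the reciprocal of the mean squared norm of the profiles; the function name `gip_kernel` and the toy matrix are hypothetical.

```python
import numpy as np

def gip_kernel(profiles):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Assumes K[i, j] = exp(-gamma * ||p_i - p_j||^2) with
    gamma = 1 / mean_i ||p_i||^2 (the standard GIP normalisation).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are zero")
    gamma = 1.0 / mean_sq_norm
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against round-off
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)
    return np.exp(-gamma * sq_dist)

# Toy adjacency matrix: 3 drugs (rows) x 2 microbes (columns).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
D_gip = gip_kernel(A)    # drug-drug similarity from the rows of A
M_gip = gip_kernel(A.T)  # microbe-microbe similarity from the columns of A
```

Drugs 0 and 2 have identical interaction profiles in the toy matrix, so their similarity is exactly 1.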

Microbe/Drug functional similarity
In the STRING 25 database, we can find many gene functional networks connected to microbes. The Kamneva 26 tool, which determines microbe functional similarity based on microbial gene families, produces a matrix $M_F \in \mathbb{R}^{n_m \times n_m}$. The SIMCOMP2 tool 27 uses the chemical structures and molecular formulas of drugs to determine how similar their structures are. We adopt its similarity scores to create a drug functional similarity matrix $D_F \in \mathbb{R}^{n_d \times n_d}$.

Drug/microbe integrated similarities
It is important to note that functional similarity cannot be determined for every drug. We therefore construct a new matrix $D \in \mathbb{R}^{n_d \times n_d}$ of integrated drug similarities from the drug structural similarity and the drug Gaussian kernel similarity.
where $D_{GIP}$ is the drug Gaussian kernel similarity, and $D_F$ is the drug functional similarity.
Similarly, the integrated microbe similarity matrix $M \in \mathbb{R}^{n_m \times n_m}$ is calculated as follows: where $M_{GIP}$ is the microbe Gaussian kernel similarity, and $M_F$ is the microbe functional similarity.
The microbe-drug adjacency matrix, the integrated drug similarity matrix and the integrated microbe similarity matrix can be joined to form a single matrix $N \in \mathbb{R}^{(n_d+n_m) \times (n_d+n_m)}$:
$$N = \begin{bmatrix} D & A \\ A^T & M \end{bmatrix}$$
where $A^T$ represents the transpose of $A$.
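The assembly of the heterogeneous network can be sketched as follows. The block layout [[D, A], [A^T, M]] is an assumption inferred from the stated dimensions of $N$ and $A$; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def build_heterogeneous_network(D, M, A):
    """Stack drug similarity D, microbe similarity M and adjacency A into N.

    Assumed layout: N = [[D, A], [A.T, M]], which has shape
    (n_d + n_m, n_d + n_m) when A has shape (n_d, n_m).
    """
    n_d, n_m = A.shape
    if D.shape != (n_d, n_d) or M.shape != (n_m, n_m):
        raise ValueError("similarity matrices do not match the shape of A")
    return np.block([[D, A], [A.T, M]])

# Toy inputs: identity similarities, 3 drugs and 2 microbes.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
D = np.eye(3)
M = np.eye(2)
N = build_heterogeneous_network(D, M, A)
```

Because $D$ and $M$ are symmetric, the resulting $N$ is symmetric as well, which is convenient when it is treated as the weighted adjacency of an undirected graph.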

Predicting microbe-drug associations by NNM
Nuclear norm minimization is a convex optimization model that is applied in many fields 28 and has a globally optimal solution 11. The nuclear norm minimization of the heterogeneous network $N$ can therefore be expressed as: where $\|N\|_*$ represents the nuclear norm of $N$ and $\Omega$ is the set of positions of the known elements.
Since predicted scores for microbe-drug associations should lie in [0,1], we add constraints to the model to ensure that the unknown elements fall within the range [0,1]. The resulting prediction model is: where $\varepsilon$ denotes the measurement noise, $\|\cdot\|_F$ the Frobenius norm, and $P_\Omega$ the orthogonal projection onto $\Omega$. The inequality-constrained model is then replaced by a regularized model: where $\alpha$ is an adjustable parameter. Inspired by the literature 29, the model can be optimized by introducing an auxiliary matrix $X$: The problem is then solved by minimizing the augmented Lagrangian function: where $Y$ is the Lagrange multiplier and $\beta > 0$ is the penalty factor.
The problem is then solved iteratively. First, the matrix $X_{k+1}$ is computed: The optimal solution of Eq. (15) is: Then the matrix $N_{k+1}$ is updated with the other variables fixed: where $\vartheta_\tau(x)$ is the singular value shrinkage operator, $\theta_i$ are the singular values of $X$ that are larger than $\tau$, and $\mu_i$ and $\nu_i$ are the left and right singular vectors corresponding to $\theta_i$.
With the other variables fixed, the Lagrange multiplier $Y_{k+1}$ is updated as follows: Finally, the prediction matrix $A_1$ for microbe-drug associations is obtained as:
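The singular value shrinkage operator used in the iteration can be illustrated with the sketch below. It implements the standard soft-thresholding form, which keeps only singular values larger than τ and reduces each of them by τ. It is only this one building block, not the full iterative scheme, and the function name is hypothetical.

```python
import numpy as np

def singular_value_shrinkage(X, tau):
    """Soft-threshold the singular values of X by tau.

    Returns sum_i max(theta_i - tau, 0) * u_i v_i^T, where theta_i, u_i, v_i
    are the singular values and left/right singular vectors of X.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * s_shrunk) @ Vt

# Toy example: singular values 3 and 1, threshold 2 -> singular values 1 and 0.
X = np.diag([3.0, 1.0])
X_shrunk = singular_value_shrinkage(X, tau=2.0)
```

Shrinking the singular values lowers the rank of the estimate, which is why this operator appears in nuclear norm minimization solvers.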

Predicting latent microbe-drug associations by GAT
GAT, a spatial graph neural network with an attention-based design, performs node classification on graph-structured data 24. To capture the structure of the matrix $N$, we built a GAT model. First, the attention score between any two nodes in $N$ is determined: where $N_i$ stands for the total number of nodes, $a$ is an attention coefficient, $W$ is a learnable linear transformation, $h_i$ represents the feature vector of node $i$, $\mu$ is a hyperparameter and $\|$ denotes concatenation. Consequently, the final output feature of each node is: The activation function, relu, is defined as follows:
A feature matrix in $\mathbb{R}^{(n_d+n_m) \times l}$ is produced by substituting $N$ into the GAT model described above, where $X_d$ and $X_m$ stand for the drug nodes and the microbe nodes in $N$, respectively. After several tests, we chose the MSE loss as the loss function for optimizing our model.
Following the literature 20, an improved random walk with restart (RWR) is applied to $D$ to obtain a new matrix $D_Z$. The RWR is defined as follows: where $\varepsilon_i$ is the initial probability vector, $X$ is the matrix of transition probabilities, and the remaining parameter is the restart probability. Similarly, we produce a new matrix $M_Z$ by applying the improved RWR to $M$.
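To make the attention step concrete, the following sketch implements one single-head graph attention layer in NumPy, following the widely used GAT formulation: scores are LeakyReLU(a^T [W h_i || W h_j]), normalised by a softmax over each node's neighbours, and the output passes through ReLU. It is an assumption that NMGMDA uses exactly this form. The negative slope of 0.2, the added self-loops, the random weights and all names are illustrative only, and no training is performed.

```python
import numpy as np

def gat_layer(H, adj, W, a, negative_slope=0.2):
    """One single-head graph attention layer (forward pass only).

    H: (n, f) node features; adj: (n, n) weights, > 0 marks an edge;
    W: (f, l) linear transformation; a: (2 * l,) attention vector.
    Returns the (n, l) output features and the (n, n) attention weights.
    """
    Z = H @ W                                   # transformed features, (n, l)
    l = Z.shape[1]
    # a^T [z_i || z_j] splits into a[:l].z_i + a[l:].z_j
    e = (Z @ a[:l])[:, None] + (Z @ a[l:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)  # LeakyReLU
    # Attend only to neighbours; self-loops keep every row non-empty.
    mask = (adj > 0) | np.eye(adj.shape[0], dtype=bool)
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)        # numerically stable softmax
    weights = np.exp(e)
    alpha = weights / weights.sum(axis=1, keepdims=True)
    return np.maximum(alpha @ Z, 0.0), alpha    # ReLU activation

# Toy graph with 3 nodes; the network matrix doubles as the feature matrix.
rng = np.random.default_rng(0)
N = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
out, alpha = gat_layer(N, N, W, a)
```

In practice the layer would be trained with a deep learning framework. The sketch only shows how the attention weights restrict aggregation to neighbouring nodes.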
As a result, by combining the drug matrix $X_d$, $D_F$, $D_Z$ and the adjacency matrix $A$, and following the literature 22, we construct a new drug feature matrix $Z_d$: Similarly, we construct a new microbe feature matrix $Z_m$: Finally, we use the dot product to derive the microbe-drug association prediction score: where swish is an activation function, $\beta$ is a learnable parameter that is typically set to 1, $Z_d(d_i)$ denotes the $i$-th row of $Z_d$ and $Z_m(m_j)$ denotes the $j$-th row of $Z_m$.
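The scoring step can be sketched as below, assuming the score for a pair is the swish activation applied to the dot product of the corresponding rows of $Z_d$ and $Z_m$, with swish(x) = x · sigmoid(βx). The feature matrices here are hypothetical toy values, not outputs of the model.

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))

def score_matrix(Z_d, Z_m, beta=1.0):
    """Score every drug-microbe pair as swish(Z_d[i] . Z_m[j])."""
    return swish(Z_d @ Z_m.T, beta)

# Toy features: 3 drugs and 2 microbes in a 2-dimensional feature space.
Z_d = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])
Z_m = np.array([[1.0, 0.0],
                [0.0, 2.0]])
A2 = score_matrix(Z_d, Z_m)
```

Orthogonal feature rows give a dot product of 0 and therefore a score of 0, while aligned rows give a positive score.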

Final predicted score of microbe-drug associations
The prediction matrix $A_1$ obtained through NNM and the prediction matrix $A_2$ generated through GAT are combined by a weighted arithmetic mean, giving the following final prediction matrix $A^*$ of microbe-drug associations: where the weight value sets the relative contribution of the two matrices.
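A weighted arithmetic mean of two score matrices can be written as in the sketch below. Which of the two matrices receives the weight and which receives one minus the weight is not recoverable from the text, so assigning `weight` to the NNM scores is an assumption made for illustration; the function name is hypothetical.

```python
import numpy as np

def fuse_scores(A1, A2, weight):
    """Weighted arithmetic mean of two prediction matrices.

    Assumption: `weight` multiplies A1 (NNM) and 1 - weight multiplies A2 (GAT).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    A1 = np.asarray(A1, dtype=float)
    A2 = np.asarray(A2, dtype=float)
    if A1.shape != A2.shape:
        raise ValueError("prediction matrices must have the same shape")
    return weight * A1 + (1.0 - weight) * A2

# Toy scores: the fused value lies between the two inputs.
A1 = np.ones((2, 2))
A2 = np.zeros((2, 2))
A_star = fuse_scores(A1, A2, weight=0.7)
```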

Experiments and results
In this section, we first carry out a parameter sensitivity analysis to obtain the best performance from the model. Then, six state-of-the-art methods are selected for comparison with NMGMDA. Finally, to confirm the validity of our model, we choose two representative drugs and two representative microbes for case studies.

Parameter sensitivity analysis
The NMGMDA model consists of three parts. In NNM, $\alpha$ and $\beta$ in formula (14) are two crucial parameters. In GAT, the dimension $l$ and the learning rate $l_r$ are the two most important factors. In the final prediction formula (32), the weight value is an important parameter. In this section, to find appropriate settings and to ensure the independence of the training and test sets, we first randomly selected 20% of the known associations and 20% of the unknown associations as the training sets, with the remainder serving as test sets. We then ran fivefold CV experiments on the MDAD database, ensuring that each experiment was independent.
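One common way to organise a fivefold CV over known associations is sketched below: the known pairs are shuffled and split into five folds, and in each round the held-out pairs are set to 0 in the training matrix. This is a generic protocol shown for illustration and may differ from the exact sampling scheme described above; the function name, the seed and the toy matrix are hypothetical.

```python
import numpy as np

def five_fold_masks(A, n_splits=5, seed=0):
    """Yield (training matrix, held-out pairs) for each CV fold.

    In every fold, the held-out known associations are hidden (set to 0)
    in a copy of A, so the model never sees them during training.
    """
    rng = np.random.default_rng(seed)
    positives = np.argwhere(A == 1)            # (n_pos, 2) index pairs
    order = rng.permutation(len(positives))
    for fold in np.array_split(order, n_splits):
        test_pairs = positives[fold]
        train = A.copy()
        train[test_pairs[:, 0], test_pairs[:, 1]] = 0
        yield train, test_pairs

# Toy adjacency matrix with 10 known associations.
A_demo = (np.arange(20).reshape(4, 5) % 2 == 0).astype(float)
folds = list(five_fold_masks(A_demo))
```

Each known association is held out exactly once across the five folds.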
In NNM, we varied $\alpha$ and $\beta$ jointly over {0.1, 1, 10, 100, 1000}. Then, using fivefold CV, we computed the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPR) for each parameter combination. The results are displayed in Table 2, which shows that NMGMDA obtains its best AUC and AUPR when $\alpha$ and $\beta$ are 100 and 1, respectively.
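For reference, both evaluation metrics can be computed from labels and predicted scores as in the following sketch. AUC is computed through the pairwise (Mann-Whitney) formulation, and AUPR is approximated by average precision, which is one of several common estimators of the area under the precision-recall curve. The function names and the toy data are hypothetical.

```python
import numpy as np

def auc_score(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of the positives."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits == 1].mean())

# Toy example: two positives and two negatives.
labels = np.array([1, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.1])
auc = auc_score(labels, scores)            # 3 of 4 pairs ranked correctly
aupr = average_precision(labels, scores)   # mean of 1/1 and 2/3
```

The pairwise AUC uses memory proportional to the number of positive-negative pairs, so for large score matrices a rank-based implementation from an established library would be preferable.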
Figure 2 shows that changing any single one of these factors caused no substantial change in the results. We chose 32 as the dimension $l$ of the node topological representation because it gives a slightly better AUPR value than 64 or 128 dimensions. In line with typical learning models, the learning rate $l_r$ was set to 0.01.
Finally, Fig. 3 shows the results for the weight value in formula (32), which we varied over {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} under fivefold CV on MDAD. NMGMDA attains its maximum AUC and AUPR values when the weight value is set to 0.7.
After comparing the performance under different hyperparameters, the final parameters we selected are $\alpha = 100$, $\beta = 1$, $l = 32$, $l_r = 0.01$ and a weight value of 0.7.

Comparison with advanced methods
Considering the limited availability of microbe-drug association prediction methods, we compared NMGMDA with representative methods for link prediction problems, such as HMDAKATZ 14, HMDA-Pred 30, LAGCN 31, MNNMAD 32 and GSAMDA 20. HMDAKATZ predicts associations between microbes and drugs using the KATZ algorithm. HMDA-Pred is a computational model for predicting microbe-disease associations based on multi-data integration and network consistency projection. LAGCN is an end-to-end graph-based deep learning method that predicts associations between drugs and diseases. MNNMAD predicts microbe-disease associations by applying a matrix nuclear norm approach to data on known microbes and diseases. GSAMDA is a computational model for predicting potential microbe-drug associations based on a graph attention network and a sparse auto-encoder. We tested these methods with their default settings and compared them under fivefold CV, using AUC and AUPR as performance indicators on the MDAD and aBiofilm databases. The results are shown in Table 3 and Fig. 4. The proposed NMGMDA model achieves the best prediction performance of all the methods.

Case study
To test the practical prediction power of the NMGMDA model, we chose two well-known drugs (Ciprofloxacin and Moxifloxacin) and two common microbes (Human immunodeficiency virus type 1 and Mycobacterium tuberculosis) for case studies.
Ciprofloxacin is an organic molecule with an excellent bactericidal effect and broad-spectrum antibacterial activity 33. It has been shown to be a successful treatment for both acute and chronic urinary tract infections, as well as a variety of systemic infections 34. Staphylococcus aureus 35, Haemophilus influenzae 36 and Stenotrophomonas maltophilia 37 are all susceptible to its antibacterial properties. After deleting the 10 associations already recorded in the MDAD database, we ranked the Ciprofloxacin-related microbes by predicted score from highest to lowest and chose the top 20 microbes for validation. As indicated in Table 4, 19 of the top 20 predicted microbes associated with Ciprofloxacin have been verified by published research in PubMed. Moxifloxacin 39 belongs to the quinolone class of drugs and is mostly used to treat infections of the skin and soft tissues in adults as well as upper and lower respiratory tract infections 38,39. According to the literature 40, Moxifloxacin is an effective treatment for Stenotrophomonas maltophilia keratitis. As indicated in Table 5, after removing the 4 known associations in the MDAD database, we found that 17 of the top 20 predicted microbes associated with Moxifloxacin had been verified by PubMed literature.
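The ranking procedure used in the case studies (remove the known associations, then sort the remaining candidates by predicted score) can be sketched as follows. The function name, the scores and the candidate names are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def top_k_candidates(scores, known, names, k=20):
    """Rank candidates by predicted score after excluding known associations.

    scores: predicted scores for one drug (or one microbe);
    known:  boolean flags, True where the association is already in the database;
    names:  candidate names aligned with `scores`.
    """
    scores = np.asarray(scores, dtype=float).copy()
    scores[np.asarray(known, dtype=bool)] = -np.inf   # drop known associations
    order = np.argsort(-scores, kind="stable")[:k]
    return [(names[i], float(scores[i])) for i in order if np.isfinite(scores[i])]

# Toy example with 4 candidate microbes, one of which is already known.
pred = [0.9, 0.2, 0.7, 0.5]
known = [True, False, False, False]
names = ["m0", "m1", "m2", "m3"]
top2 = top_k_candidates(pred, known, names, k=2)
```

Each returned candidate would then be checked manually against the published literature, as done for Tables 4 to 7.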
Regarding microbes, the first is Human immunodeficiency virus type 1 (HIV-1), a virus that attacks the human immune system and causes AIDS, an extremely dangerous infectious disease 41. HIV-1 has been widely studied in relation to various drugs. Saquinavir, for instance, was shown by Hervé Trout 42 to be an effective treatment for HIV-1-infected individuals who have diarrhea and/or wasting syndrome. According to the literature 43, lopinavir/ritonavir is the first-line protease inhibitor generally recommended in the initial treatment regimen for people with HIV-1 infection. After removing the 26 known associations in the MDAD database, we found that 16 of the top 20 predicted drugs associated with Human immunodeficiency virus type 1 had been validated by PubMed literature (see Table 6). Mycobacterium tuberculosis is the second microbe used in the case study. It is the pathogen that causes tuberculosis 44, and many drugs, including ciprofloxacin 45 and triclosan 46, have been shown to be associated with it. After removing the 14 known associations in the MDAD database, Table 7 indicates that 14 of the top 20 candidate drugs were linked to Mycobacterium tuberculosis.
In conclusion, these two sets of case studies further demonstrate that the NMGMDA model can predict associations between microbes and drugs.

Figure 2. The AUC and AUPR values for different dimensions of the node topological representation and different learning rates on the MDAD database.

Figure 3. The AUC and AUPR values for different weight values on the MDAD database.

Figure 4. ROC curves based on the MDAD database for six competitive methods.

Table 1. The specific statistics of microbe-drug associations in MDAD and aBiofilm.

Table 2. The AUC and AUPR values for different α and β on the MDAD database.

Table 3. The AUCs and AUPRs of the compared methods on the MDAD and aBiofilm databases under fivefold CV.

Table 4. The top 20 Ciprofloxacin-associated candidate microbes on MDAD. The top 10 microbes are listed in the first column, while the top 11-20 microbes are listed in the third column.

Table 5. The top 20 Moxifloxacin-associated candidate microbes on MDAD. The top 10 microbes are listed in the first column, while the top 11-20 microbes are listed in the third column.

Table 6. The top 20 Human immunodeficiency virus type 1-associated candidate drugs on MDAD. The top 10 drugs are listed in the first column, while the top 11-20 drugs are listed in the third column.

Table 7. The top 20 Mycobacterium tuberculosis-associated candidate drugs on MDAD. The top 10 drugs are listed in the first column, while the top 11-20 drugs are listed in the third column.